ARTICLE Prediction of Protein Solubility in Escherichia coli Using Logistic Regression
Abstract
In this article we present a new and more accurate model for the prediction of the solubility of proteins overexpressed in the bacterium Escherichia coli. The model uses the statistical technique of logistic regression. To build this model, 32 parameters that could potentially correlate well with solubility were used. In addition, the protein database was expanded compared to those used previously. We tested several different implementations of logistic regression with varied results. The best implementation, which is the one we report, exhibits excellent overall prediction accuracies: 94% for the model and 87% by cross-validation. For comparison, we also tested discriminant analysis using the same parameters, and we obtained a less accurate prediction (69% cross-validation accuracy for the stepwise forward plus interactions model). Biotechnol. Bioeng. 2009;9999: 1–10. © 2009 Wiley Periodicals, Inc.
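The two accuracy figures quoted above (fit on the full data set versus cross-validation) can be illustrated with a minimal sketch. This is not the authors' model or data: the data set is synthetic, three made-up features stand in for the 32 parameters, and the function names, learning rate, and 5-fold split are assumptions chosen only for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log-loss; w[0] is the intercept.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    # Classify as soluble (1) when the predicted probability is at least 0.5.
    p = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))
    return 1 if p >= 0.5 else 0

def accuracy(w, X, y):
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)

def cross_val_accuracy(X, y, k=5, seed=0):
    # Shuffle indices, split into k folds, train on k-1 folds, score the held-out fold.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(w, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k

def make_synthetic(n=200, seed=1):
    # Hypothetical data: the label depends on a noisy linear score of two features;
    # the third feature is pure noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
        score = 2.0 * x[0] - 1.5 * x[1] + rng.gauss(0, 0.5)
        X.append(x)
        y.append(1 if score > 0 else 0)
    return X, y

X, y = make_synthetic()
model_acc = accuracy(fit_logistic(X, y), X, y)  # accuracy on the training data
cv_acc = cross_val_accuracy(X, y)               # mean held-out accuracy
print("model accuracy:", model_acc)
print("cross-validation accuracy:", cv_acc)
```

The first number is measured on the same data used for fitting, so it tends to be optimistic; the cross-validated number is the more honest estimate of how the model would perform on unseen proteins.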
Similar resources
Prediction of protein solubility in Escherichia coli using logistic regression.
In this article we present a new and more accurate model for the prediction of the solubility of proteins overexpressed in the bacterium Escherichia coli. The model uses the statistical technique of logistic regression. To build this model, 32 parameters that could potentially correlate well with solubility were used. In addition, the protein database was expanded compared to those used previou...
Prediction of Protein Solubility in Escherichia Coli Using Discriminant Analysis, Logistic Regression, and Artificial Neural Network Models
Recombinant DNA technology is important in the mass production of proteins for academic, medical, and industrial use, and the prediction of the solubility of proteins is a significant part of it. However, the protein solubility when overexpressed in a host organism is difficult to predict. Thus, a model capable of accurately estimating the likelihood of proteins to form insoluble inclusion bodi...
Bioinformatics approaches for improved recombinant protein production in Escherichia coli: protein solubility prediction
The solubility of recombinant protein expressed in Escherichia coli often represents the production yield. However, to date, instances of successful production of soluble recombinant proteins in the E. coli expression system with high yield remain scarce. This is mainly due to the difficulties in improving the overall production capacity, as most of the well-established strategies usually involv...
Fuzzy Hybrid least-Squares Regression Approach to Estimating the amount of Extra Cellular Recombinant Protein A from Escherichia coli BL21
Introduction: Immune Protein A is a component with a vast spectrum of biochemical, biological and medical usages. The coding gene of this protein was extracted from Staphylococcus aureus and was cloned and expressed in Escherichia coli bacteria. Suitable statistical methods are utilized to optimize expression conditions for evaluating experiment accuracy, guarantee the accuracy of subsequent ...
Enhancement of Solubility and Specific Activity of a Cu/Zn Superoxide Dismutase by Co-expression with a Copper Chaperone in Escherichia coli
Background: Human Cu/Zn superoxide dismutase (hSOD1) is an antioxidant enzyme with potential as a therapeutic agent. However, heterologous expression of hSOD1 has remained an issue due to Cu²⁺ insufficiency at the protein active site, leading to low solubility and enzymatic activity. Objectives: The effect of co-expressed human copper chaperone (hCCS) to enhance the solubility and enzymatic act...